Prosodic parallelism as a cue to repetition and error correction disfluency

Authors

  • Jennifer Cole
  • Mark Hasegawa-Johnson
  • Chilin Shih
  • Heejin Kim
  • Eun-Kyung Lee
  • Hsin-Yi Dora Lu
  • Yoonsook Mo
  • Taejin Yoon
Abstract

Complex disfluencies that involve the repetition or correction of words are frequent in conversational speech, with repetition disfluencies alone accounting for over 20% of disfluencies. Yet these disfluencies generally do not lead to comprehension errors for human listeners. We propose that the frequent occurrence of parallel prosodic features in the reparandum (REP) and alteration (ALT) intervals of complex disfluencies may serve as strong perceptual cues that signal the disfluency to the listener. We report results from a transcription analysis of complex disfluencies that classifies disfluent regions on the basis of prosodic factors, together with preliminary evidence from F0 analysis that supports our finding of prosodic parallelism.

1. Acoustic-prosodic correlates of disfluency

Disfluency occurs in spontaneous speech at a rate of about one every 10-20 words, or 6% per word count [17], yet this interruption of fluent speech does not generally lead to comprehension errors for human listeners. Recent research has shown that important cues to disfluency can be found in the syntactic and semantic structures conveyed by the word sequence, and in the phonological and phonetic structures signaled by acoustic features local to the disfluency interval. These cues identify the three components of the disfluent region, namely the reparandum (REP), the edit phrase (EDT), and the alteration (ALT), as well as the junctures between them. Work on automatic disfluency detection has shown that the most successful approach combines lexical and acoustic features, with explicit models of the lexical-syntactic and prosodic features that pattern systematically with disfluent intervals [1,6]. Of the acoustic-prosodic correlates of disfluency, the post-reparandum pause (filled or unfilled) has been studied most extensively.
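As a quick consistency check on the rate quoted above (our own arithmetic, not a figure from the paper): one disfluency every n words corresponds to a per-word rate of 1/n, so the 10 to 20 word range brackets the 6% figure.

```latex
\[
\frac{1}{20} = 5\%, \qquad \frac{1}{10} = 10\%, \qquad
\frac{1}{0.06} \approx 16.7 \ \text{words per disfluency}
\]
```

A 6% per-word rate thus corresponds to roughly one disfluency every 17 words, inside the 10-20 word range.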
Nakatani & Hirschberg [12], in detailed acoustic and classification studies, examine duration, F0, and energy, and also report unusual patterns of lengthening, coarticulation, and glottalization near the interruption point of a disfluency. In this paper we examine how characteristic patterns of F0, duration, and energy identify and distinguish among the various types of disfluency that involve word repetition and error correction.
Disfluencies fall into distinct types that can be characterized in terms of their form and function. Shriberg [16, 17] classifies the disfluencies of the Switchboard corpus into six categories: filled pause (“uh” and “um”), repetition (of one or more words, without correction), substitution (repetition of zero or more words, followed by the correction of the last word in the disfluent interval), insertion, deletion, and speech error. Other work additionally identifies abandonment (fresh start) disfluencies [6,11,18]. These distinct types of disfluency may be caused by different psychological processes. Levelt [9] suggests that corrections of a single word may result from monitoring of the phonetic plan, while corrections that involve repair or abandonment of an entire phrase may result from monitoring of the pre-syntactic message. Clark & Fox Tree [3] and Clark & Wasow [4] propose a different psychological account of filled pause and repetition disfluencies. In these accounts, filled pauses like “uh” and “um” are phonological words that the speaker uses to signal a delay in preparing upcoming speech. Repetition disfluencies occur when the speaker makes a premature commitment to the production of a constituent, perhaps as a strategy for holding the floor, and then hesitates while the appropriate phonetic plan is formed. Speech then resumes by “backing up” and repeating one or more words that precede the hesitation, as a way of restoring fluent delivery.
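Shriberg's lexical categories can be made concrete with a small sketch. This is a hypothetical illustration, not Shriberg's annotation procedure or the method used in this paper: it assumes each disfluency has already been segmented into REP, EDT, and ALT word lists, compares word strings only (ignoring prosody entirely), and labels a few of the categories with simple rules. The names `Disfluency`, `classify`, and `is_subsequence` are our own.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Disfluency:
    """A disfluent region segmented into REP, EDT and ALT word lists (hypothetical format)."""
    reparandum: List[str]                                 # REP: words that are repeated or corrected
    alteration: List[str]                                 # ALT: words that repeat or replace the REP
    edit_phrase: List[str] = field(default_factory=list)  # EDT: e.g. ["uh"] or ["I", "mean"]


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if every word of `short` appears in `long` in the same order."""
    remaining = iter(long)
    return all(word in remaining for word in short)  # `in` consumes the iterator


def classify(d: Disfluency) -> str:
    """Word-level heuristic for some of Shriberg's categories; prosody is ignored."""
    rep, alt = d.reparandum, d.alteration
    if not rep and not alt:
        return "filled pause" if d.edit_phrase else "other"
    if rep == alt:
        return "repetition"      # same words, no correction
    if len(rep) == len(alt) and rep[:-1] == alt[:-1]:
        return "substitution"    # zero or more repeated words, last word corrected
    if len(alt) > len(rep) and is_subsequence(rep, alt):
        return "insertion"       # ALT adds words to the REP
    if len(alt) < len(rep) and is_subsequence(alt, rep):
        return "deletion"        # ALT drops words from the REP
    return "other"               # e.g. speech errors, fresh starts


if __name__ == "__main__":
    print(classify(Disfluency(["I", "want"], ["I", "want"], ["uh"])))              # repetition
    print(classify(Disfluency(["to", "Boston"], ["to", "Denver"], ["I", "mean"])))  # substitution
    print(classify(Disfluency(["I", "want"], ["I", "really", "want"])))            # insertion
```

For example, “I want, uh, I want” is labeled a repetition and “to Boston, I mean, to Denver” a substitution. Speech errors and fresh starts fall through to "other", since they cannot be identified from word identity alone.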
Henry & Pallaud [7] support the findings of Clark & Wasow [4] by demonstrating that morphological, syntactic, and structural features strongly differentiate repetition disfluencies from word-fragment disfluencies. Clark & Wasow [4] note that repetition disfluencies are four times as common as repair disfluencies; they suggest that a small number of repetition disfluencies may be “covert repairs” [9], but that most repetitions are more closely related to filled pause disfluencies than to speech repairs.
The acoustic-prosodic features that cue disfluency vary according to the type of disfluency. Levelt & Cutler [10] observe contrastive emphasis on the repair segment of an error-correcting disfluency, manifested in increased F0, duration, and amplitude. Shriberg [15] and Plauché & Shriberg [13] find that F0 contours, word durations, and the distribution of pauses serve to differentiate among three types of repetition disfluency. Shriberg [15] describes repetition disfluencies that signal covert repair as having a characteristic reset of the F0 contour to a high, phrase-initial value at the onset of the alteration. Similarly, Savova & Bachenko [14] propose an “expanded reset rule,” according to which “alteration onsets are dependent on both reparandum onsets and reparandum offsets,” echoing Shriberg’s [15] observation that when speakers modify the duration of a repeated word in a repetition disfluency, “they tend to do so in a way that preserves intonation patterns and local pitch range relationships.”
In our study of prosody and disfluency in the Switchboard corpus of conversational telephone speech, we observe that parallelism in the prosodic features of the REP and ALT intervals is characteristic of repetition and error correction disfluencies. For the majority of the repetition and error correction disfluencies we have observed, highly similar F0 patterns in the REP and ALT express a parallel intonation structure that cues the relationship between the two intervals.
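One way to quantify the REP/ALT parallelism described above is sketched below. This is an illustrative assumption, not the F0 analysis reported in this paper: it assumes an F0 contour (in Hz, voiced frames only) has already been extracted for each interval, time-normalizes both contours by linear interpolation onto the same number of points, and takes their Pearson correlation as a parallelism score (near +1 for parallel shapes, negative for mirror-image shapes). It also computes the F0 step from the REP offset to the ALT onset as a crude indicator of the reset Shriberg describes. All function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def resample(contour: Sequence[float], n: int = 10) -> List[float]:
    """Time-normalize an F0 contour onto n evenly spaced points by linear interpolation."""
    if len(contour) < 2 or n < 2:
        raise ValueError("need at least two F0 samples and n >= 2")
    last = len(contour) - 1
    out = []
    for i in range(n):
        pos = i * last / (n - 1)  # fractional index into the original contour
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat contour carries no shape to compare
    return sxy / sqrt(sxx * syy)


def parallelism(rep_f0: Sequence[float], alt_f0: Sequence[float], n: int = 10) -> float:
    """Shape similarity of the REP and ALT F0 contours after time normalization."""
    return pearson(resample(rep_f0, n), resample(alt_f0, n))


def onset_step(rep_f0: Sequence[float], alt_f0: Sequence[float]) -> float:
    """F0 change (Hz) from the last REP sample to the first ALT sample."""
    return alt_f0[0] - rep_f0[-1]


if __name__ == "__main__":
    rep = [200, 220, 240, 210, 180]       # hypothetical rise-fall on the REP
    alt = [190, 212, 230, 200, 172, 165]  # similar shape, slightly lower, one frame longer
    print(round(parallelism(rep, alt), 3))
    print(onset_step(rep, alt))  # 190 - 180 = 10 Hz rise at ALT onset
```

Correlation ignores overall pitch level, so an ALT produced in a lower register with the same rise-fall shape still scores as parallel; a level-sensitive comparison would need an additional measure such as the onset step.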
In this paper we propose an extended typology of repetition disfluencies based on a prosodic comparison of the REP and ALT intervals. Section 2 describes the methods of our transcription study of disfluency in Switchboard, and Section 3 presents frequency data on five sub-categories of repetition and error correction disfluency that are distinguished by comparing the prosodic features of the REP and ALT intervals. Section 4 reports preliminary quantitative evidence from F0 data that supports our analysis based on perceptual transcription.

Publication date: 2005